Abstract

A generative model is developed for deep (multi-layered) convolutional dictionary
learning. A novel probabilistic pooling operation is integrated into the deep model,
yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic
learning. Experimental results demonstrate the model's ability to learn multi-layer
features from images, and excellent classification results are obtained on the MNIST
and Caltech 101 datasets.


1 Introduction 


We develop a deep generative statistical model, which starts at the highest-level features and maps
these through a sequence of layers, until ultimately mapping to the data plane (e.g., an image). A
feature at a given layer is mapped via a multinomial distribution to one feature in a block of features
at the layer below, and all other features in that block are set to zero. This is analogous to the
method in Lee et al. (2009), in the sense of imposing that there is at most one non-zero activation
within a pooling block. We use bottom-up pretraining, in which we first learn the parameters of each
layer sequentially, one layer at a time from bottom to top, based on the features at the layer below.
In the subsequent refinement phase, all model parameters are learned jointly, top-down. Each
consecutive layer in the model is locally conjugate in a statistical sense, so learning the model
parameters may be readily performed using sampling or variational methods.
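To make the local-conjugacy point concrete, here is a minimal Python sketch of one closed-form conditional update. It assumes, for illustration only, a beta-Bernoulli prior on the binary entries of $Z^{(n,k)}$ (each entry drawn as Bernoulli($\pi_k$), with $\pi_k \sim$ Beta($a_0$, $b_0$)); neither this prior nor the names `beta_posterior_params`, `a0` and `b0` are taken from this section.

```python
import numpy as np


def beta_posterior_params(z, a0=1.0, b0=1.0):
    """Beta(a, b) parameters of p(pi_k | z) under an ASSUMED beta-Bernoulli prior.

    Assumed (illustrative) model: each binary entry z_i ~ Bernoulli(pi_k),
    pi_k ~ Beta(a0, b0). Conjugacy gives pi_k | z ~ Beta(a0 + #ones, b0 + #zeros).
    """
    z = np.asarray(z)
    ones = int(z.sum())
    return a0 + ones, b0 + (z.size - ones)


rng = np.random.default_rng(0)
z = np.array([[0, 1, 0], [0, 0, 1]])  # toy binary activation map Z^(n,k)
a, b = beta_posterior_params(z)
print(a, b)               # 3.0 5.0
pi_draw = rng.beta(a, b)  # one closed-form Gibbs update of pi_k
```

Because the conditional stays in the Beta family, a Gibbs sampler can draw $\pi_k$ exactly, and a mean-field variational scheme can update the Beta parameters in closed form from expected counts.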


2 Modeling Framework 


Assume $N$ gray-scale images $\{X^{(n)}\}_{n=1,N}$, with $X^{(n)} \in \mathbb{R}^{N_x \times N_y}$; the images are analyzed jointly
to learn the convolutional dictionary $\{D^{(k)}\}_{k=1,K}$. Specifically, consider the model

$$X^{(n)} = \sum_{k=1}^{K} D^{(k)} * \left(Z^{(n,k)} \odot W^{(n,k)}\right) + E^{(n)} \qquad (1)$$

where $*$ is the convolution operator, $\odot$ denotes the Hadamard (element-wise) product, the elements
of $Z^{(n,k)}$ are in $\{0,1\}$, the elements of $W^{(n,k)}$ are real, and $E^{(n)}$ represents the residual.
The binary map $Z^{(n,k)}$ indicates which shifted versions of $D^{(k)}$ are used to represent $X^{(n)}$.
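Equation (1) can be read as placing weighted, shifted copies of each dictionary element into the image. The NumPy sketch below builds $X^{(n)}$ from toy $D^{(k)}$, $Z^{(n,k)}$ and $W^{(n,k)}$. It assumes a "full" convolution, so that a $d_x \times d_y$ element and an $(N_x - d_x + 1) \times (N_y - d_y + 1)$ activation map give an $N_x \times N_y$ image ($d_x, d_y$ are our notation), and it assumes an i.i.d. Gaussian residual controlled by a hypothetical `noise_std`; the function names are also hypothetical.

```python
import numpy as np


def place_shifted(D, S):
    """Full 2D convolution D * S, written as a sum of shifted copies of D.

    Each non-zero S[i, j] adds S[i, j] * D with D's top-left corner at (i, j),
    so the support of S (i.e. of the binary map Z) selects which shifts appear.
    """
    dx, dy = D.shape
    X = np.zeros((S.shape[0] + dx - 1, S.shape[1] + dy - 1))
    for i, j in zip(*np.nonzero(S)):
        X[i:i + dx, j:j + dy] += S[i, j] * D
    return X


def generate_image(D_list, Z_list, W_list, noise_std=0.0, rng=None):
    """Draw X = sum_k D_k * (Z_k elementwise-times W_k) + E.

    E is ASSUMED i.i.d. Gaussian with standard deviation noise_std.
    """
    X = sum(place_shifted(D, Z * W) for D, Z, W in zip(D_list, Z_list, W_list))
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        X = X + noise_std * rng.standard_normal(X.shape)
    return X


# Toy example: K = 2 atoms of size 2x2 and 2x2 activation maps, so X is 3x3.
D = [np.ones((2, 2)), np.eye(2)]
Z = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [0, 1]])]
W = [np.full((2, 2), 2.0), np.full((2, 2), -1.0)]
X = generate_image(D, Z, W)
```

Writing the convolution as a loop over the non-zero entries of $Z^{(n,k)} \odot W^{(n,k)}$ makes the role of the sparse binary map explicit; for large images an FFT-based convolution would be the practical choice.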

Assume an $L$-layer model, with layer $L$ the top layer and layer 1 at the bottom, closest to the
data. In the pretraining stage, the output of layer $l$ is the input to layer $l+1$, after pooling. Layer
$l \in \{1, \dots, L\}$ has $K_l$ dictionary elements, and we have:

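$$X^{(n,l+1)} = \sum_{k_{l+1}=1}^{K_{l+1}} D^{(k_{l+1},l+1)} * \underbrace{\left(Z^{(n,k_{l+1},l+1)} \odot W^{(n,k_{l+1},l+1)}\right)}_{S^{(n,k_{l+1},l+1)}} + E^{(n,l+1)}$$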

The expression $X^{(n,l+1)}$ may be viewed as a 3D entity, with its $k_l$-th plane defined by a "pooled"
version of $S^{(n,k_l,l)}$.
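Since $X^{(n,l+1)}$ has $K_l$ planes, one natural reading is that each dictionary element $D^{(k_{l+1},l+1)}$ is a stack of $K_l$ 2D filters sharing a single 2D activation map $S^{(n,k_{l+1},l+1)}$. The sketch below assumes that reading; the array layout `(K_l, d_x, d_y)`, the function names and the omission of the residual $E^{(n,l+1)}$ are our choices for illustration.

```python
import numpy as np


def conv_full_2d(D, S):
    """Full 2D convolution, written as a sum of shifted copies of the filter D."""
    dx, dy = D.shape
    X = np.zeros((S.shape[0] + dx - 1, S.shape[1] + dy - 1))
    for i, j in zip(*np.nonzero(S)):
        X[i:i + dx, j:j + dy] += S[i, j] * D
    return X


def reconstruct_layer(D3_list, S_list):
    """Noise-free X^(l+1) = sum_k D3_k * S_k, each D3_k of ASSUMED shape (K_l, dx, dy).

    Plane p of the output is sum_k D3_k[p] * S_k, so the result has K_l planes,
    one for each dictionary element of the layer below.
    """
    K_l = D3_list[0].shape[0]
    planes = [
        sum(conv_full_2d(D3[p], S) for D3, S in zip(D3_list, S_list))
        for p in range(K_l)
    ]
    return np.stack(planes)


# Toy example: K_l = 2 planes, K_{l+1} = 2 elements, 1x1 activation maps.
D3a = np.stack([np.ones((2, 2)), np.zeros((2, 2))])
D3b = np.stack([np.zeros((2, 2)), np.eye(2)])
X_next = reconstruct_layer([D3a, D3b], [np.array([[2.0]]), np.array([[3.0]])])
```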

Accepted as a workshop contribution at ICLR 2015

Figure 1: Schematic of the proposed generative process. Left: bottom-up pretraining; right: top-down
refinement. (Zoom in for best visualization; a larger version can be found in the Supplementary Material.)

The 2D activation map $S^{(n,k_l,l)}$ is partitioned into $n_x \times n_y$ dimensional contiguous blocks (pooling
blocks with respect to layer $l+1$ of the model); see the left part of Figure 1. Associated with each
block of pixels in $S^{(n,k_l,l)}$ is one pixel at plane $k_l$ of $X^{(n,l+1)}$; the relative locations of the pixels
in $X^{(n,l+1)}$ are the same as the relative locations of the blocks in $S^{(n,k_l,l)}$. Within each block of
$S^{(n,k_l,l)}$, either all $n_x n_y$ pixels are zero, or only one pixel is non-zero, with the position of that pixel
selected stochastically via a multinomial distribution. Each pixel at plane $k_l$ of $X^{(n,l+1)}$ equals the
largest-amplitude element in the associated block of $S^{(n,k_l,l)}$ (i.e., max pooling).
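The pooling constraint and the max-amplitude pooling step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the multinomial has one outcome per pixel of the block plus one "all-zero" outcome, its probabilities are simply given (uniform in the toy usage) rather than learned, and `sample_pooling_block` and `max_abs_pool` are hypothetical names.

```python
import numpy as np


def sample_pooling_block(probs, nx, ny, rng):
    """Draw the support of one nx-by-ny pooling block of S.

    probs has nx*ny + 1 entries: one per pixel position and a final entry for
    "all pixels zero". The returned binary mask has at most one 1.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (nx * ny + 1,):
        raise ValueError("probs must have nx*ny + 1 entries")
    idx = rng.choice(nx * ny + 1, p=probs / probs.sum())
    mask = np.zeros(nx * ny, dtype=int)
    if idx < nx * ny:  # the last outcome leaves the block empty
        mask[idx] = 1
    return mask.reshape(nx, ny)


def max_abs_pool(S, nx, ny):
    """Map each nx-by-ny block of S to its largest-amplitude element (sign kept)."""
    H, W = S.shape
    if H % nx or W % ny:
        raise ValueError("S must tile exactly into nx-by-ny blocks")
    blocks = S.reshape(H // nx, nx, W // ny, ny).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(H // nx, W // ny, nx * ny)
    pick = np.abs(blocks).argmax(axis=-1)
    return np.take_along_axis(blocks, pick[..., None], axis=-1)[..., 0]


# One 4x4 activation map S with 2x2 pooling blocks and uniform multinomial weights.
rng = np.random.default_rng(0)
nx = ny = 2
S = np.zeros((4, 4))
for a in range(2):
    for b in range(2):
        mask = sample_pooling_block(np.full(nx * ny + 1, 0.2), nx, ny, rng)
        S[a * nx:(a + 1) * nx, b * ny:(b + 1) * ny] = mask * rng.normal(size=(nx, ny))
X_plane = max_abs_pool(S, nx, ny)  # one plane of the next layer's input, shape (2, 2)
```

Because a block has at most one non-zero pixel, max-amplitude pooling returns exactly that pixel (or zero), so the pooled value also equals the block sum; only the position of the active pixel is lost in pooling.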

The learning performed with the top-down generative model (right part of Fig. 1) constitutes a
refinement of the parameters learned during pretraining, and the excellent initialization provided
by the pretrained parameters is key to the subsequent model performance.

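In the refinement phase, we now proceed top-down, from layer $L$ to the data $X^{(n)}$. At the top layer, the
generative process combines $D^{(k_L,L)}$ with $Z^{(n,k_L,L)} \odot W^{(n,k_L,L)}$, and after convolution $X^{(n,L)}$ is manifested;
the residual term is now absent at all layers except layer $l = 1$, at which the fit to the data is performed.
Each element of $X^{(n,l)}$ has an associated pooling block in $S^{(n,k_{l-1},l-1)}$.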
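A companion sketch for the top-down direction routes each element of $X^{(n,l)}$ back into its pooling block in $S^{(n,k_{l-1},l-1)}$. It assumes the active pixel's position is drawn from a given multinomial (uniform by default), that the chosen pixel takes the element's value, and that a zero element leaves its block empty; `unpool` and `position_probs` are hypothetical names, not the authors' code.

```python
import numpy as np


def unpool(X_plane, nx, ny, rng, position_probs=None):
    """Route each element of X_plane to one sampled pixel of its nx-by-ny block.

    A zero element leaves its block all-zero; a non-zero element is written to a
    single position drawn from a multinomial over the nx*ny pixels (uniform if
    position_probs is None), so each block has at most one non-zero pixel.
    """
    if position_probs is None:
        p = np.full(nx * ny, 1.0 / (nx * ny))
    else:
        p = np.asarray(position_probs, dtype=float)
        p = p / p.sum()
    H, W = X_plane.shape
    S = np.zeros((H * nx, W * ny))
    for a in range(H):
        for b in range(W):
            if X_plane[a, b] == 0:
                continue
            i, j = divmod(int(rng.choice(nx * ny, p=p)), ny)  # row-major position
            S[a * nx + i, b * ny + j] = X_plane[a, b]
    return S


X_plane = np.array([[1.5, 0.0], [-2.0, 0.7]])
S = unpool(X_plane, 2, 2, np.random.default_rng(0))
# Sum and non-zero count within each 2x2 block of S.
blocks = S.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)
print(blocks.sum(axis=(2, 3)))                # recovers X_plane
print(np.count_nonzero(blocks, axis=(2, 3)))  # at most 1 per block
```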


3 Experimental Results 

We here apply our model to the MNIST and Caltech 101 datasets.

MNIST Dataset. Table 1 summarizes the classification results of our model compared with some
related results on the MNIST data. The second (top) layer features corresponding to the refined
dictionary are sent to a nonlinear support vector machine (SVM) (Chang & Lin, 2011) with a Gaussian
kernel, in a one-vs-all multi-class classifier, with classifier parameters tuned via 5-fold
cross-validation (no tuning on deep feature learning).

Table 1: Classification error on MNIST data

| Method | Test error |
|---|---|
| 6-layer Conv. Net + 2-layer Classifier + elastic distortions (Ciresan et al., 2011) | 0.35% |
| MCDNN (Ciresan et al., 2012) | 0.23% |
| SPCNN (Zeiler & Fergus, 2013) | 0.47% |
| HBP (Chen et al., 2013), 2-layer cFA + 2-layer features | 0.89% |
| Ours, 2-layer model + 1-layer features | 0.42% |
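The classification stage can be sketched with scikit-learn, whose `SVC` is built on LIBSVM (Chang & Lin, 2011). `SVC` handles multi-class problems one-vs-one internally, so the sketch wraps it in `OneVsRestClassifier` to obtain a one-vs-all classifier and tunes `C` and the RBF `gamma` by 5-fold cross-validation with `GridSearchCV`. The parameter grid, the synthetic features and the function name `fit_feature_classifier` are assumptions for illustration; in the described pipeline the inputs would be the top-layer features.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def fit_feature_classifier(features, labels):
    """One-vs-all Gaussian (RBF) kernel SVM, with C and gamma chosen by 5-fold CV."""
    # Hypothetical search grid; the parameter ranges are not given in the text.
    grid = {
        "estimator__C": [1.0, 10.0, 100.0],
        "estimator__gamma": ["scale", 0.01, 0.1],
    }
    ova_svm = OneVsRestClassifier(SVC(kernel="rbf"))
    search = GridSearchCV(ova_svm, grid, cv=5)
    search.fit(features, labels)
    return search  # refit on all training data with the best parameters


# Synthetic stand-in for flattened top-layer feature maps (3 classes, 10-D).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 10))
y = np.repeat(np.arange(3), 30)
feats = centers[y] + rng.normal(size=(90, 10))
clf = fit_feature_classifier(feats, y)
print(clf.best_params_, clf.score(feats, y))
```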


Caltech 101 Dataset. We next consider the Caltech 101 dataset. For Caltech 101 classification, we
follow the setup in Yang et al. (2009), selecting 15 and 30 images per category for training and
testing on the rest. The features of the test images are inferred based on the top-layer dictionaries
and sent to a multi-class SVM; we again use a Gaussian-kernel nonlinear SVM with parameters tuned
via cross-validation. Our results and related results are summarized in Table 2.
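The Caltech 101 protocol described above amounts to a per-category random split of the images. A minimal NumPy sketch follows; the function name `per_category_split` and the toy label vector are hypothetical stand-ins for the real category labels.

```python
import numpy as np


def per_category_split(labels, n_train, rng):
    """Return (train_idx, test_idx): n_train random images per category, rest for test.

    rng.choice raises ValueError if a category has fewer than n_train images.
    """
    labels = np.asarray(labels)
    is_train = np.zeros(labels.size, dtype=bool)
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        is_train[rng.choice(members, size=n_train, replace=False)] = True
    return np.flatnonzero(is_train), np.flatnonzero(~is_train)


# Toy label vector: three categories with 40, 35 and 50 images.
labels = np.repeat([0, 1, 2], [40, 35, 50])
rng = np.random.default_rng(0)
for n_train in (15, 30):
    train_idx, test_idx = per_category_split(labels, n_train, rng)
    print(n_train, len(train_idx), len(test_idx))
```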


Table 2: Classification accuracy on Caltech 101

| Method | 15 training images per category | 30 training images per category |
|---|---|---|
| DN (Zeiler et al., 2010) | 58.6% | 66.9% |
| CDBN (Lee et al., 2009) | 57.7% | 65.4% |
| HBP (Chen et al., 2013) | 58% | 65.7% |
| ScSPM (Yang et al., 2009) | 67% | 73.2% |
| P-FV (Seidenari et al.) | 71.47% | 80.13% |
| R-KSVD (Li et al., 2013) | 79% | 83% |
| Convnet (Zeiler & Fergus, 2014) | 83.8% | 86.5% |
| Ours, 2-layer model + 1-layer features | 70.02% | 80.31% |
| Ours, 3-layer model + 1-layer features | 75.24% | 82.78% |


4 Conclusions 

A deep generative convolutional dictionary-learning model has been developed within a Bayesian
setting. The proposed framework enjoys efficient bottom-up and top-down probabilistic inference.
A probabilistic pooling module has been integrated into the model; it is a key component in developing
a principled top-down generative model with efficient learning and inference. Experimental results on
MNIST and Caltech 101 demonstrate the efficacy of the model in learning multi-layered features from images.